The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
The efficient parallelisation of the finite element method is based on non-overlapping partitioning of the computational domain into an appropriate number of subdaomins. The problem size required for efficient application of parallel solution techniques is considerably large. The problem description in terms of finite element nodes and elements is complicated and difficult to handle with respect to...
Parallel quicksort is known as a very scalable parallel sorting algorithm. However, most actual implementations have been limited to basic versions which suffer from irregular communication patterns and load imbalance. We have implemented a high performance variant of parallel quicksort which incorporates the following optimizations: Stop the recursion at the right time, sort locally first, use accurate...
Parallel list ranking is a hard problem due to its extreme degree of irregularity. Also because of its linear sequential complexity, it requires considerable effort to just reach speed-up one (break even). In this paper, we address the question of how to solve the list-ranking problem for lists of length up to 2·108 in practice: we consider implementations on the Intel Paragon, whose PUs are laid-out...
A distributed algorithm is presented for generating all maximal cliques in a network graph, based on the sequential version of Tsukiyama et al. [TIAS77]. The time complexity of the proposed approach is restricted to the induced neighborhood of a node, and the communication complexity is O(md) where m is the number of connections, and d is the maximum degree in the graph. Messages are O(log n) bits...
In this paper we present a probabilistic model for analysing trees generated during problem solving using best-first search Branch and Bound algorithm. We consider the weight associated with the edges as well as the number of children in a node to be random. The expressions obtained for the sequential algorithm are applied to several parallel strategies using a multiprocessor with distributed memory.
Cilk (pronounced “silk”) is a C-based language for multithreaded parallel programming. Cilk makes it easy to program irregular parallel applications, especially as compared with data-parallel or message-passing programming systems. A Cilk programmer need not worry about protocols and load balancing, which are handled by Cilk's provably efficient runtime system. Many regular and irregular Cilk applications...
Starting from a specific implementation of the Lanczos biorthogonalization algorithm, an iterative process for the solution of systems of linear equations with general non-Hermitian coefficient matrix is derived. Due to the orthogonalization of the underlying Lanczos process the resulting iterative scheme involves inner products leading to global communication and synchronization on parallel processors...
For the solutions of linear systems of equations with unsymmetric coefficient matrices, we has proposed an improved version of the quasi-minimal residual (IQMR) method by using the Lanczos process as a major component combining elements of numerical stability and parallel algorithm design. For Lanczos process, stability is obtained by a couple two-term procedure that generates Lanczos vectors scaled...
We present a programming system based around a set of highly concurrent, highly distributed data structures. These shared abstract data types offer a uniform interface onto a set of possible implementations, each optimised for particular patterns of use and more relaxed coherence conditions in the data. They allow applications written using a shared data model to approach the performance of message...
Performing runtime parallelization on general networks of workstations (NOWs) without special hardware or system software supports is very difficult, especially for DOACROSS loops. With the high communication overhead on NOWs, there is hardly any performance gain for runtime parallelization, due to the latter's large amount of messages for dependence detection, data accesses, and computation scheduling...
We study a distributed load balancing problem on arbitrary graphs. First Order (FO) and Second Order (SO) schemes are popular local diffusive schedules for this problem. To use them, several parameters have to be chosen carefully. Determining the “optimal” parameters analytically is difficult, and on a practical level, despite the widespread use of these schemes, little is known on how relevant parameters...
Efficient scheduling of parallel tasks onto processing elements of concurrent computer systems has been an important research issue for decades. The communication overhead often limits the speedup of parallel programs in distributed systems. Duplication Based Scheduling (DBS) has been proposed to reduce the communication overhead by duplicating remote parent tasks on local processing elements. The...
In this paper we study the problem of register allocation in the presence of parallel conditional branches with a given branching depth d. We start from a scheduled flow graph and the goal is to find an assignment of the variables in the flow graph to a minimum number of registers. This problem can be solved by coloring the corresponding conflict graph G=(V, E). We describe a new approximation algorithm...
We consider the following classical resource constrained scheduling problem. Given m identical processors, s resources R1,⋯, R3 with upper bounds b1,⋯, b3, n independent jobs T1,⋯, Tn of unit length, where each job requires one processor and an amount Ri(j) ∈ 0,1 of resource Ri, i=1,⋯, s, the optimization problem...
The Virtual Data Space is a standard C-library which automatically distributes the work-packets generated by parallel applications across the processing nodes. VDS is a universal system offering loadbalancing-mechanisms for applications which incorporate independent load-items and scheduling algorithms for those which comprise precedence-constraints...
We address the problem of improving the data cache performance of numerical applications — specifically, those with blocked (or tiled) loops. We present DAT, a data alignment technique utilizing array-padding, to improve program performance through minimizing cache conflict misses. We describe algorithms for selecting tile sizes for maximizing data cache utilization, and computing pad sizes for eliminating...
This paper presents SUPPLE (SUPport for Parallel Loop Execution), an innovative run-time support for parallel loops with regular stencil data references and non-uniform iteration costs. SUPPLE relies upon a static block data distribution to exploit locality, and combines static and dynamic policies for scheduling non-uniform iterations. It adopts, as far as possible, a static scheduling policy derived...
For applications involving large data sets yielding variablecost computations, achieving both efficient I/O and load balancing may become particularly challenging though performance-critical tasks. In this work, we introduce a data scheduling approach that integrates several optimizing techniques, including dynamic allocation, prefetching, and asynchronous I/O and communications. We show that good...
Stochastic modeling forms the basis for analysis in many areas, including biological and economic systems, as well as the performance and reliability modeling of computers and communication networks. One common approach is the state-space-based technique, which, starting from a high-level model, uses depth-first search to generate both a description of every possible state of the model and the dynamics...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.